DETAILED ACTION 

1. The amendment filed 9/18/2006 has been entered. Claims 1, 9, 17-23, 31, 35-40, 
46, 48, 52 and 53 have been amended. Claims 32-34, 45 and 49-51 have been 
canceled. No claims have been added. The specification has been amended. 
Accordingly, claims 1-31, 35-44, 46-48, 52 and 53 are pending in this office action. 



Claim Rejections - 35 USC § 103 

The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 
USPQ 459 (1966), that are applied for establishing a background for determining 
obviousness under 35 U.S.C. 103(a) are summarized as follows: 

1. Determining the scope and contents of the prior art. 

2. Ascertaining the differences between the prior art and the claims at issue. 

3. Resolving the level of ordinary skill in the pertinent art. 

4. Considering objective evidence present in the application indicating 
obviousness or nonobviousness. 

Claims 1-23, 28, 29, 35-40, 52 and 53 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over WO 03060766 (hereinafter Lind) (art of record) in view of US 
5794236 (hereinafter Mehrle). 

As for claim 1, Lind discloses: a scoring module determining a score which is 
assigned to at least one concept that has been extracted from a plurality of 
electronically-stored documents (See page 17 lines 20-24; note: definition of document 
corpus), wherein the score is based on at least one of a frequency of occurrence of the 
at least one concept within at least one such document, a concept weight, a structural 
weight, and a corpus weight (See page 7 lines 20-24); and a clustering module forming 
clusters of the documents by evaluating the score for the at least one concept of each 
document for a best fit to the clusters and assigning each document to the cluster with the 
best fit (See page 19 lines 4-10). While Lind does not differ substantially from the 
claimed invention, the disclosure of a threshold module dynamically determining a 
threshold for each cluster based on similarities between the documents grouped into the 
cluster and a center of the cluster, and reassigning those documents having similarities 
outside the threshold, is not necessarily explicit. Mehrle, however, does disclose a 
threshold module dynamically determining a threshold for each cluster based on 
similarities between the documents grouped into the cluster and a center of the cluster 
(See column 6 lines 30-42, column 9 lines 4-7 and column 9 lines 54-62); and 
reassigning those documents having similarities outside the threshold (See column 9 
lines 3-10). It would have been obvious to an artisan of ordinary skill in the pertinent art 
to have incorporated the teachings of Mehrle into the system of Lind. The modification 
would have been obvious because having seeds allows for more efficient clustering and 
document retrieval. 
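
For illustration only, the threshold limitation discussed above can be sketched in 
Python. Neither Lind nor Mehrle is quoted here as specifying how the threshold is 
computed, so the rule used below (mean similarity minus k population standard 
deviations) is purely an assumption, as are the names dynamic_threshold and 
split_outliers. The sketch only shows a threshold derived per cluster from the 
similarities between its documents and its center, with documents falling outside 
that threshold set aside for reassignment. 

```python
from statistics import mean, pstdev


def dynamic_threshold(similarities, k=1.0):
    """Hypothetical rule (an assumption, not taken from the cited art):
    threshold = mean - k * population standard deviation of the
    document-to-center similarities of one cluster."""
    return mean(similarities) - k * pstdev(similarities)


def split_outliers(cluster, k=1.0):
    """cluster maps a document id to its similarity to the cluster center.

    Returns (kept, outliers). Outliers are the documents whose similarity
    falls below the cluster's own threshold and would be reassigned."""
    threshold = dynamic_threshold(list(cluster.values()), k)
    kept = {doc: sim for doc, sim in cluster.items() if sim >= threshold}
    outliers = {doc: sim for doc, sim in cluster.items() if sim < threshold}
    return kept, outliers


cluster = {"a": 0.90, "b": 0.88, "c": 0.92, "d": 0.30}
kept, outliers = split_outliers(cluster)
```

Because the threshold is recomputed from each cluster's own similarities, it changes 
whenever the membership of the cluster changes, which is the sense of "dynamic" 
assumed in this sketch. 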



Application/Control Number: 10/626,984 



Art Unit: 2166 






As for claim 2 the rejection of claim 1 is incorporated, and further Lind discloses: 
the scoring module calculating the score as a function of a summation of at least one of 
the frequency of occurrence, the concept weight, the structural weight, and the corpus 
weight of the at least one concept (See Page 23 lines 1-4). 

As for claim 3 the rejection of claim 2 is incorporated, and further Lind discloses: 
a compression module compressing the score through logarithmic compression (See 
page 17 lines 30-34). 

As for claim 4 the rejection of claim 1 is incorporated, and further Lind discloses: 
the scoring module calculating the concept weight as a function of a number of terms 
comprising the at least one concept (See page 21 lines 25-28). 

As for claim 5 the rejection of claim 1 is incorporated, and further Lind discloses: 
the scoring module calculating the structural weight as a function of a location of the at 
least one concept within the at least one such document (See page 18 lines 10-14). 

As for claim 6 the rejection of claim 1 is incorporated, and further Lind discloses: 
the scoring module calculating the corpus weight as a function of a reference count of 
the at least one concept over the plurality of documents (See page 18 lines 19-21; note: 
this is an inverse weight of the reference count). 

As for claim 7 the rejection of claim 1 is incorporated, and further Lind discloses: 
the scoring module forming the score assigned to the at least one concept to a 
normalized score vector for each such document, determining a similarity between the 
normalized score vector for each such document as an inner product of each 
normalized score vector, and applying the similarity to the best fit criterion (See page 30 
line 30 to page 31 line 1). 

As for claim 8 the rejection of claim 1 is incorporated, and further Lind discloses: 
the clustering module evaluating a set of candidate seed documents selected from the 
plurality of documents, identifying a set of seed documents by applying the score for the 
at least one concept to a best fit criterion for each such candidate seed document, and 
basing the best fit criterion on the score of each such seed document (See page 28 lines 
8-16; note: representative = seed). 

Claims 9-16 are method claims corresponding to system claims 1-8 respectively, 
and are thus rejected for the reasons set forth in the rejection of claims 1-8. 

Claim 17 is rejected for the same reasons as claim 9. 

As for claim 18, Lind discloses: a scoring module scoring a document in an 
electronically-stored document set comprising: a frequency module determining a 
frequency of occurrence of at least one concept within a document (See page 18 lines 
1-3); a concept weight module analyzing a concept weight reflecting a specificity of 
meaning for the at least one concept within the document (See page 25 lines 27-30; 
note: rtc(t,c) is a value based on meaning); a structural weight module analyzing a 
structural weight reflecting a degree of significance based on structural location within 
the document for the at least one concept (See page 18 lines 8-13); a corpus weight 
module analyzing a corpus weight inversely weighing a reference count of occurrences 
for the at least one concept within the document (See page 18 lines 19-21; note: this is 
an inverse weight of the reference count); and a scoring evaluation module evaluating a 
score to be associated with the at least one concept as a function of the frequency, 
concept weight, structural weight, and corpus weight (See page 21 lines 24-27). While Lind 
does not differ substantially from the claimed invention, the disclosure of a threshold 
module dynamically determining a threshold for each cluster based on similarities 
between the documents grouped into the cluster and a center of the cluster, and 
reassigning those documents having similarities outside the threshold, is not 
necessarily explicit. Mehrle, however, does disclose a threshold module dynamically 
determining a threshold for each cluster based on similarities between the documents 
grouped into the cluster and a center of the cluster (See column 6 lines 30-42); and 
reassigning those documents having similarities outside the threshold (See column 9 
lines 3-10). It would have been obvious to an artisan of ordinary skill in the pertinent art 
to have incorporated the teachings of Mehrle into the system of Lind. The modification 
would have been obvious because having seeds allows for more efficient clustering and 
document retrieval. 

As for claim 19, the rejection of claim 18 is incorporated, and further Lind 
discloses: the scoring module evaluating the score in accordance with the formula 

$$S_i = \sum_{j} f_{ij} \times cw_{ij} \times sw_{ij} \times rw_{ij}$$

where $S_i$ comprises the score, $f_{ij}$ comprises the frequency, 
$0 < cw_{ij} \le 1$ comprises the concept weight, $0 < sw_{ij} \le 1$ comprises the structural weight, 
and $0 < rw_{ij} \le 1$ comprises the corpus weight for occurrence j of concept i (See page 23 
lines 1-4). 
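
As a non-authoritative illustration, the summation recited in claim 19 can be written 
out in Python. The function name concept_score and the tuple layout are hypothetical, 
and the range check assumes each weight lies in the interval (0, 1], which is how the 
garbled bounds in the claim are read here. 

```python
def concept_score(occurrences):
    """Sum f * cw * sw * rw over the occurrences j of one concept i.

    occurrences: iterable of (f, cw, sw, rw) tuples, one per occurrence.
    Each weight is assumed to satisfy 0 < w <= 1 (an assumption about the
    claim's bounds)."""
    total = 0.0
    for f, cw, sw, rw in occurrences:
        for name, weight in (("cw", cw), ("sw", sw), ("rw", rw)):
            if not 0.0 < weight <= 1.0:
                raise ValueError(f"{name} must lie in (0, 1], got {weight}")
        total += f * cw * sw * rw
    return total


# Two occurrences of the same concept: 2*0.5*1.0*1.0 + 1*1.0*0.5*0.25
score = concept_score([(2, 0.5, 1.0, 1.0), (1, 1.0, 0.5, 0.25)])
```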

As for claim 20, the rejection of claim 19 is incorporated, and further Lind 
discloses: the concept weight module evaluating the concept weight in accordance with 
the formula: 

$$cw_{ij} = \begin{cases} 0.25 + (0.25 \times t_{ij}), & 1 \le t_{ij} \le 3 \\ 0.25 + (0.25 \times [7 - t_{ij}]), & 4 \le t_{ij} \le 6 \\ 0.25, & t_{ij} \ge 7 \end{cases}$$

(See page 17 lines 30-34). 
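
The piecewise concept weight of claim 20 can be sketched as follows, assuming that t 
is the integer number of terms in the concept (per claim 4) and that the range bounds 
are inclusive, since strict bounds would leave t = 1, 3, 4, 6 and 7 undefined. The 
function name concept_weight is hypothetical. 

```python
def concept_weight(t):
    """Concept weight as a function of the number of terms t in a concept.

    Assumes t is a positive integer and that the claim's ranges are
    inclusive (1..3, 4..6, 7 and above)."""
    if t < 1:
        raise ValueError("a concept must contain at least one term")
    if t <= 3:
        return 0.25 + 0.25 * t
    if t <= 6:
        return 0.25 + 0.25 * (7 - t)
    return 0.25


weights = [concept_weight(t) for t in range(1, 9)]
```

Under these assumptions the weight rises to 1.0 for three-term and four-term concepts 
and falls back to 0.25 for concepts of seven or more terms. 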

As for claim 21, the rejection of claim 19 is incorporated, and further Lind 
discloses: the structural weight module evaluating the structural weight in accordance 
with the formula: 

$$sw_{ij} = \begin{cases} 1.0, & \text{if } (j \approx \text{SUBJECT}) \\ 0.8, & \text{if } (j \approx \text{HEADING}) \\ 0.7, & \text{if } (j \approx \text{SUMMARY}) \\ 0.5, & \text{if } (j \approx \text{BODY}) \\ 0.1, & \text{if } (j \approx \text{SIGNATURE}) \end{cases}$$

where $sw_{ij}$ comprises the structural weight for occurrence j of each such concept i (See 
page 21 lines 25-29). 
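
For illustration, the structural weight of claim 21 amounts to a lookup keyed on the 
location of the occurrence. The sketch below assumes the location is already known as 
one of the five labels recited in the claim; the names STRUCTURAL_WEIGHTS and 
structural_weight are hypothetical. 

```python
# Weights as recited in claim 21, keyed by structural location.
STRUCTURAL_WEIGHTS = {
    "SUBJECT": 1.0,
    "HEADING": 0.8,
    "SUMMARY": 0.7,
    "BODY": 0.5,
    "SIGNATURE": 0.1,
}


def structural_weight(location):
    """Return the weight for an occurrence found at the given location.

    Raises KeyError for a location the claim does not list, rather than
    guessing a default."""
    return STRUCTURAL_WEIGHTS[location.upper()]


subject_weight = structural_weight("subject")
signature_weight = structural_weight("SIGNATURE")
```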

As for claim 22, the rejection of claim 19 is incorporated, and further Lind 
discloses: the corpus weight module evaluating the corpus weight in accordance with 
the formula: 

$$rw_{ij} = \begin{cases} \left( \dfrac{T - r_{ij}}{T} \right)^{2}, & r_{ij} > M \\ 1.0, & r_{ij} \le M \end{cases}$$

where $rw_{ij}$ comprises the corpus weight, $r_{ij}$ comprises a reference count for occurrence j 
of each such concept i, T comprises a total number of reference counts of documents in 
the document set, and M comprises a maximum reference count of documents in the 
document set (See page 23 lines 20-23). 
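
A minimal sketch of the corpus weight of claim 22 follows, assuming the second branch 
covers reference counts less than or equal to M. The function and parameter names are 
hypothetical. 

```python
def corpus_weight(r, total, max_ref):
    """Corpus weight for a concept with reference count r.

    total   -- T, the total number of reference counts in the document set
    max_ref -- M, the maximum reference count
    Counts above M are weighted down quadratically; counts at or below M
    (inclusive bound assumed) keep the full weight of 1.0."""
    if total <= 0:
        raise ValueError("total must be positive")
    if r > max_ref:
        return ((total - r) / total) ** 2
    return 1.0


rare = corpus_weight(5, total=100, max_ref=10)
common = corpus_weight(50, total=100, max_ref=10)
```

In this reading, a concept referenced more often than M is discounted, which matches 
the examiner's note that the corpus weight is an inverse weight of the reference count. 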

As for claim 23, the rejection of claim 19 is incorporated, and further Lind 
discloses: a compression module compressing the score in accordance with the formula 
$S'_i = \log(S_i + 1)$, where $S'_i$ comprises the compressed score for each such concept i 
(See page 27 lines 1-7). 
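
The logarithmic compression of claim 23 can be sketched in one line. The claim does 
not state the base of the logarithm, so the natural logarithm is assumed here; the 
function name compress_score is hypothetical. 

```python
import math


def compress_score(score):
    """Compress a non-negative score as log(score + 1).

    Natural logarithm assumed; math.log1p computes log(1 + x) accurately
    for small x."""
    if score < 0:
        raise ValueError("score must be non-negative")
    return math.log1p(score)


zero = compress_score(0.0)
one = compress_score(math.e - 1)
```

Adding 1 before taking the logarithm keeps a score of zero at zero and keeps all 
compressed scores non-negative. 
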

As for claim 28 the rejection of claim 18 is incorporated, and further Mehrle 
discloses a plurality of candidate seed documents (See column 2 lines 42-46), a 
similarity module determining a similarity between each pair of a candidate seed 
document and a cluster center (See column 8 lines 14-23); a clustering module 
designating each such candidate seed document separated from substantially all cluster 
centers with such similarity being sufficiently distinct as a seed document, and grouping 
each such candidate seed document not being sufficiently distinct into a cluster with a 
nearest cluster center (See column 9 lines 3-10). It would have been obvious to an 
artisan of ordinary skill in the pertinent art to have incorporated the teachings of Mehrle 
into the system of Lind. The modification would have been obvious because having 
seeds allows for more efficient clustering and document retrieval. 

As for claim 29 the rejection of claim 28 is incorporated, and further Mehrle 
discloses: a plurality of candidate seed documents; a similarity module determining a 
similarity between each pair of a candidate seed document and a cluster center; a 
clustering module designating each such candidate seed document separated from 
substantially all cluster centers with such similarity being sufficiently distinct as a seed 
document, and grouping each such candidate seed document not being sufficiently 
distinct into a cluster with a nearest cluster center. 

Claims 35-40 are method claims comprising substantially the same limitations as 
system claims 18-23, and are thus rejected for the reasons set forth in the rejection of 
claims 18-23. 

Claim 46 is a method claim corresponding to system claim 29 and is thus 
rejected for the same reason as set forth in the rejection of claim 29. 

Claim 52 is rejected for substantially the same reasons as claim 35. 

Claim 53 is an apparatus claim corresponding to method claim 18 and is thus 
rejected for the same reasons as claim 18. 

Claims 24-27 and 41-44 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Lind as applied to claims 18 and 35 above, and further in view of US 
6675159 (hereinafter Klein) (art of record). 

As for claim 24 the rejection of claim 18 is incorporated, and further Klein 
discloses: a global stop concept vector cache maintaining concepts and terms (See 
column 18 lines 17-20 and column 14 lines 45-49); and a filtering module filtering 
selection of the at least one concept based on the concepts and terms maintained in the 
global stop concept vector cache (See column 14 lines 45-50). It would have been 
obvious to an artisan of ordinary skill in the pertinent art at the time of the invention to 
have incorporated the teachings of Klein into the system of Lind. The modification would 
have been obvious because queries and documents are linked by the fact that words are 
the entities that are being processed. Therefore, any transformation capable of being 
made to a query should also be capable of being applied to documents, which makes 
document management systems more efficient and easier to maintain. 

As for claim 25 the rejection of claim 18 is incorporated, and further Klein 
discloses: a parsing module identifying terms within at least one document in the 
document set, and combining the identified terms into one or more of the concepts (See 
column 2 lines 53-56). 

As for claim 26 the rejection of claim 25 is incorporated, and further Klein 
discloses: the parsing module structuring each such identified term in the one or more 
concepts into canonical concepts comprising at least one of word root, character case, 
and word ordering (See column 14 lines 63-67). 

As for claim 27 the rejection of claim 25 is incorporated, and further Klein 
discloses wherein at least one of nouns, proper nouns and adjectives are included as 

Claims 41-44 are method claims corresponding to system claims 24-27, 
respectively, and are thus rejected for the same reasons as set forth in the rejection of 
claims 24-27. 

Claims 30,31,47 and 48 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over Lind and Mehrle as applied to claim 29 above, and further in view of 
Klein. 

As for claim 30 the rejection of claim 29 is incorporated, and further Klein 
discloses: a normalized score vector for each document comprising the score 
associated with the at least one concept for each such concept occurring within the 
document (See column 3 lines 18-21); and the similarity module determining the 
similarity as a function of the normalized score vector associated with the at least one 
concept for each such document (See column 18 lines 23-26). 

As for claim 31, the rejection of claim 30 is incorporated, and further Klein 
discloses: the similarity module calculating the similarity in accordance with the formula 

$$\cos \sigma_{AB} = \frac{\langle \vec{S}_A \cdot \vec{S}_B \rangle}{|\vec{S}_A| \, |\vec{S}_B|}$$

where $\cos \sigma_{AB}$ comprises a similarity between a document A and a document B, $\vec{S}_A$ 
comprises a score vector for document A and $\vec{S}_B$ comprises a score vector for 
document B. 
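
As a hedged illustration of the similarity recited in claim 31, the sketch below 
computes the cosine of the angle between two score vectors of equal length. The 
representation of a score vector as a plain list of floats and the function name 
cosine_similarity are assumptions, not details taken from Klein or the application. 

```python
import math


def cosine_similarity(score_a, score_b):
    """Inner product of two score vectors divided by the product of their
    Euclidean norms. Both vectors must have the same length and be
    non-zero."""
    if len(score_a) != len(score_b):
        raise ValueError("score vectors must have the same length")
    dot = sum(x * y for x, y in zip(score_a, score_b))
    norm_a = math.sqrt(sum(x * x for x in score_a))
    norm_b = math.sqrt(sum(y * y for y in score_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])
```

If both vectors are normalized to unit length beforehand, as in the normalized score 
vectors of claims 7 and 30, the denominator equals 1 and the similarity reduces to the 
inner product. 
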

Claims 47 and 48 are method claims corresponding to the system of claims 30 
and 31 respectively and are thus rejected for the same reasons as set forth in the 
rejection of claim 30. 

Response to Arguments 

Applicant's arguments filed 9/18/2006 have been fully considered but they are 
not persuasive. 

Applicant argues: 

However, Lindh fails to teach or suggest a threshold module dynamically 
determining a threshold for each cluster based on similarities between the documents 
grouped into the cluster and a center of the cluster, and reassigning those documents 
having similarities outside the threshold. Lindh also fails to teach or suggest dynamically 
determining a threshold for each cluster based on similarities between the documents 
grouped into the cluster and a center of the cluster, and reassigning those documents 
having similarities outside the threshold per. 

Examiner responds: 

Examiner is not persuaded. Examiner has pointed to Mehrle for the disclosure of 
the limitations cited above because the disclosure is more explicit in Mehrle. 

Applicant argues: 

The Lindh-Mehrle combination fails to teach or suggest dynamically determining 
a threshold for each cluster based on similarities between the documents in each 
cluster and the cluster center, and reassigning the documents with similarities outside 
the threshold, per claim 18. Rather, the threshold taught by Lindh-Mehrle is static and 
predetermined by the builder of the system, instead of being dynamically determined. 

Examiner responds: 

Examiner is not persuaded. Examiner is entitled to give claim limitations their 
broadest reasonable interpretation in light of the specification. Interpretation of Claims - 
Broadest Reasonable Interpretation: during patent examination, the pending claims must 
be 'given the broadest reasonable interpretation consistent with the specification.' 
Applicant always has the opportunity to amend the claims during prosecution, and broad 
interpretation by the examiner reduces the possibility that the claim, once issued, will be 
interpreted more broadly than is justified. In re Prater, 162 USPQ 541, 550-51 (CCPA 
1969). In this case, as disclosed in Mehrle, dynamic threshold evaluations are 
contemplated by the system of Mehrle as well as systems prior to Mehrle (See column 
2 lines 1-3). Mehrle discloses that the documents themselves can be used to set the 
threshold values, thus resulting in a dynamic determination (See Mehrle column 2 lines 
10-18). 

Conclusion 

THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time 
policy as set forth in 37 CFR 1.136(a). 

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of 
the advisory action. In no event, however, will the statutory period for reply expire later 
than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Leon J. Harper whose telephone number is 
571-272-0759. The examiner can normally be reached from 7:30 AM to 4:00 PM. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Hosain T. Alam can be reached on 571-272-3978. The fax phone number 
for the organization where this application or proceeding is assigned is 571-273-8300. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). 



LJH 

Leon J Harper 
December 1, 2006 




